转移学习(TL)已成为神经网络(NNS)的科学应用中的强大工具,例如天气/气候预测和湍流建模。 TL可以实现分布的概括(例如,参数外推)和有效的不同训练集(例如,模拟和观察值)的有效混合。在TL中,使用目标系统中的小数据集对已经训练的基础系统进行了训练的NN的选定层。对于有效的TL,我们需要知道1)重新培训的最佳层是什么? 2)在TL期间学到了哪些物理学?在这里,我们提出了新的分析和一个新的框架,以解决(1) - (2)的多种多数非线性系统。我们的方法将系统数据的光谱分析与卷积NN激活和内核的光谱分析相结合,从系统的非线性物理学来解释了TL的内部工作。使用几种2D湍流设置的亚网格尺度建模作为测试用例,我们表明,学习的内核是低,带和高通滤波器的组合,并且TL学习了新的过滤器,其性质与光谱差异一致基础和目标系统。我们还发现,在这些情况下,最浅的层是重新培训的最佳层,这违背了机器学习文献中指导TL的共同智慧。我们的框架根据物理和NN理论确定了事先重新训练的最佳层。这些分析共同解释了在TL中学到的物理学,并提供了一个框架,以指导TL,以在科学和工程中进行广泛的应用,例如气候变化建模。
translated by 谷歌翻译
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
translated by 谷歌翻译
The traditional statistical inference is static, in the sense that the estimate of the quantity of interest does not affect the future evolution of the quantity. In some sequential estimation problems however, the future values of the quantity to be estimated depend on the estimate of its current value. This type of estimation problems has been formulated as the dynamic inference problem. In this work, we formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn according to a random model parameter. We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss. Moreover, learning for dynamic inference can serve as a meta problem, such that all familiar machine learning problems, including supervised learning, imitation learning and reinforcement learning, can be cast as its special cases or variants. Gaining a good understanding of this unifying meta problem thus sheds light on a broad spectrum of machine learning problems as well.
translated by 谷歌翻译
Machine learning-based segmentation in medical imaging is widely used in clinical applications from diagnostics to radiotherapy treatment planning. Segmented medical images with ground truth are useful for investigating the properties of different segmentation performance metrics to inform metric selection. Regular geometrical shapes are often used to synthesize segmentation errors and illustrate properties of performance metrics, but they lack the complexity of anatomical variations in real images. In this study, we present a tool to emulate segmentations by adjusting the reference (truth) masks of anatomical objects extracted from real medical images. Our tool is designed to modify the defined truth contours and emulate different types of segmentation errors with a set of user-configurable parameters. We defined the ground truth objects from 230 patient images in the Glioma Image Segmentation for Radiotherapy (GLIS-RT) database. For each object, we used our segmentation synthesis tool to synthesize 10 versions of segmentation (i.e., 10 simulated segmentors or algorithms), where each version has a pre-defined combination of segmentation errors. We then applied 20 performance metrics to evaluate all synthetic segmentations. We demonstrated the properties of these metrics, including their ability to capture specific types of segmentation errors. By analyzing the intrinsic properties of these metrics and categorizing the segmentation errors, we are working toward the goal of developing a decision-tree tool for assisting in the selection of segmentation performance metrics.
translated by 谷歌翻译
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMS, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
translated by 谷歌翻译
Event cameras that asynchronously output low-latency event streams provide great opportunities for state estimation under challenging situations. Despite event-based visual odometry having been extensively studied in recent years, most of them are based on monocular and few research on stereo event vision. In this paper, we present ESVIO, the first event-based stereo visual-inertial odometry, which leverages the complementary advantages of event streams, standard images and inertial measurements. Our proposed pipeline achieves temporal tracking and instantaneous matching between consecutive stereo event streams, thereby obtaining robust state estimation. In addition, the motion compensation method is designed to emphasize the edge of scenes by warping each event to reference moments with IMU and ESVIO back-end. We validate that both ESIO (purely event-based) and ESVIO (event with image-aided) have superior performance compared with other image-based and event-based baseline methods on public and self-collected datasets. Furthermore, we use our pipeline to perform onboard quadrotor flights under low-light environments. A real-world large-scale experiment is also conducted to demonstrate long-term effectiveness. We highlight that this work is a real-time, accurate system that is aimed at robust state estimation under challenging environments.
translated by 谷歌翻译
Human Activity Recognition (HAR) is one of the core research areas in mobile and wearable computing. With the application of deep learning (DL) techniques such as CNN, recognizing periodic or static activities (e.g, walking, lying, cycling, etc.) has become a well studied problem. What remains a major challenge though is the sporadic activity recognition (SAR) problem, where activities of interest tend to be non periodic, and occur less frequently when compared with the often large amount of irrelevant background activities. Recent works suggested that sequential DL models (such as LSTMs) have great potential for modeling nonperiodic behaviours, and in this paper we studied some LSTM training strategies for SAR. Specifically, we proposed two simple yet effective LSTM variants, namely delay model and inverse model, for two SAR scenarios (with and without time critical requirement). For time critical SAR, the delay model can effectively exploit predefined delay intervals (within tolerance) in form of contextual information for improved performance. For regular SAR task, the second proposed, inverse model can learn patterns from the time series in an inverse manner, which can be complementary to the forward model (i.e.,LSTM), and combining both can boost the performance. These two LSTM variants are very practical, and they can be deemed as training strategies without alteration of the LSTM fundamentals. We also studied some additional LSTM training strategies, which can further improve the accuracy. We evaluated our models on two SAR and one non-SAR datasets, and the promising results demonstrated the effectiveness of our approaches in HAR applications.
translated by 谷歌翻译
Given a natural language that describes the user's demands, the NL2Code task aims to generate code that addresses the demands. This is a critical but challenging task that mirrors the capabilities of AI-powered programming. The NL2Code task is inherently versatile, diverse and complex. For example, a demand can be described in different languages, in different formats, and at different levels of granularity. This inspired us to do this survey for NL2Code. In this survey, we focus on how does neural network (NN) solves NL2Code. We first propose a comprehensive framework, which is able to cover all studies in this field. Then, we in-depth parse the existing studies into this framework. We create an online website to record the parsing results, which tracks existing and recent NL2Code progress. In addition, we summarize the current challenges of NL2Code as well as its future directions. We hope that this survey can foster the evolution of this field.
translated by 谷歌翻译
We investigate response generation for multi-turn dialogue in generative-based chatbots. Existing generative models based on RNNs (Recurrent Neural Networks) usually employ the last hidden state to summarize the sequences, which makes models unable to capture the subtle variability observed in different dialogues and cannot distinguish the differences between dialogues that are similar in composition. In this paper, we propose a Pseudo-Variational Gated Recurrent Unit (PVGRU) component without posterior knowledge through introducing a recurrent summarizing variable into the GRU, which can aggregate the accumulated distribution variations of subsequences. PVGRU can perceive the subtle semantic variability through summarizing variables that are optimized by the devised distribution consistency and reconstruction objectives. In addition, we build a Pseudo-Variational Hierarchical Dialogue (PVHD) model based on PVGRU. Experimental results demonstrate that PVGRU can broadly improve the diversity and relevance of responses on two benchmark datasets.
translated by 谷歌翻译
Despite impressive success in many tasks, deep learning models are shown to rely on spurious features, which will catastrophically fail when generalized to out-of-distribution (OOD) data. Invariant Risk Minimization (IRM) is proposed to alleviate this issue by extracting domain-invariant features for OOD generalization. Nevertheless, recent work shows that IRM is only effective for a certain type of distribution shift (e.g., correlation shift) while it fails for other cases (e.g., diversity shift). Meanwhile, another thread of method, Adversarial Training (AT), has shown better domain transfer performance, suggesting that it has the potential to be an effective candidate for extracting domain-invariant features. This paper investigates this possibility by exploring the similarity between the IRM and AT objectives. Inspired by this connection, we propose Domainwise Adversarial Training (DAT), an AT-inspired method for alleviating distribution shift by domain-specific perturbations. Extensive experiments show that our proposed DAT can effectively remove domain-varying features and improve OOD generalization under both correlation shift and diversity shift.
translated by 谷歌翻译